Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes
Authors
Abstract
We propose a general approach to accelerate the convergence of the widely used solution methods for Markov decision processes (MDPs). The approach is inspired by the monotone behavior of the contraction mappings in the feasible set of the linear programming problem equivalent to the MDP. Numerical studies show that the computational savings can be significant, especially in cases where standard value iteration suffers from slow convergence. The same acceleration approach can be used in other types of MDPs. This paper sheds light on a new research avenue: combining the two well-studied theories of linear programming and Markov decision processes in the hope of surmounting the curse of dimensionality.
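The baseline that the proposed operators accelerate is standard value iteration, i.e. repeated application of the Bellman optimality operator until successive iterates are close. The sketch below is a generic textbook implementation on a toy two-state, two-action MDP whose transition and reward numbers are made up for illustration; it does not reproduce the paper's acceleration operators.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Standard value iteration for a discounted MDP.

    P: transition probabilities, shape (A, S, S), rows sum to 1
    R: expected one-step rewards, shape (A, S)
    Returns an approximate optimal value function and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Bellman optimality operator:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:  # sup-norm stopping rule
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=0)

# Toy MDP (illustrative numbers only)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, pi = value_iteration(P, R)
```

Because the Bellman operator is a gamma-contraction in the sup norm, the loop converges geometrically at rate gamma; the slow-convergence regimes mentioned in the abstract arise when gamma is close to 1.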
Related works
Acceleration Operators in the Value Iteration Algorithms for Average Reward Markov Decision Processes
One of the most widely used methods for solving average cost MDP problems is the value iteration method. This method, however, is often computationally impractical, which restricts the size of solvable MDP problems. We propose acceleration operators that improve the performance of value iteration for average reward MDP models. These operators are based on two important properties of Markovian ...
Full text
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Full text
First Order Markov Decision Processes A Dissertation submitted
Relational Markov Decision Processes (RMDP) are a useful abstraction for complex reinforcement learning problems and stochastic planning problems since one can develop abstract solutions for them that are independent of domain size or instantiation. This thesis develops compact representations for RMDPs and exact solution methods for RMDPs using such representations. One of the core contributio...
Full text
Empirical Dynamic Programming
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get ‘empirical value iteration’ (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get ‘empirical policy iter...
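The "empirical value iteration" idea described in this snippet, replacing the exact expectation in the Bellman operator with a sample average over simulated next states, can be sketched as follows. The toy MDP numbers are made up, and the plain Monte Carlo resampling here is an illustrative stand-in rather than the authors' exact scheme.

```python
import numpy as np

def empirical_bellman(V, P, R, gamma, n_samples, rng):
    """One step of empirical value iteration: the expectation over next
    states in the Bellman operator is replaced by a sample average of
    V over next states drawn from the transition kernel."""
    A, S, _ = P.shape
    Q = np.empty((A, S))
    for a in range(A):
        for s in range(S):
            # Simulate next states instead of computing the exact sum.
            nxt = rng.choice(S, size=n_samples, p=P[a, s])
            Q[a, s] = R[a, s] + gamma * V[nxt].mean()
    return Q.max(axis=0)

rng = np.random.default_rng(0)

# Toy MDP (illustrative numbers only)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V = np.zeros(2)
for _ in range(200):
    V = empirical_bellman(V, P, R, gamma=0.9, n_samples=100, rng=rng)
```

Unlike exact value iteration, the iterates here are random, so convergence holds only in a probabilistic sense and the final V is a noisy estimate of the optimal value function whose error shrinks as n_samples grows.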
Full text
A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes
We present a technique for speeding up the convergence of value iteration for partially observable Markov decision processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov decision processes (MDPs). The technique can be easily incorporated into any existing POMDP value iteration algorithms. Experiments have been conducted on ...
Full text
Journal:
Operations Research
Volume 58, Issue
Pages -
Publication date: 2010